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Abstract 

After a decade of extensive study of the sparse representation synthesis model, 
we can safely say that this is a mature and stable field, with clear theoretical 
foundations, and appealing applications. Alongside this approach, there is an 
analysis counterpart model, which, despite its similarity to the synthesis alter- 
native, is markedly different. Surprisingly, the analysis model did not get a 
similar attention, and its understanding today is shallow and partial. 

In this paper we take a closer look at the analysis approach, better define it 
as a generative model for signals, and contrast it with the synthesis one. This 
work proposes effective pursuit methods that aim to solve inverse problems regu- 
larized with the analysis-model prior, accompanied by a preliminary theoretical 
study of their performance. We demonstrate the effectiveness of the analysis 
model in several experiments. 
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1. Introduction 

Situated at the heart of signal and image processing, data models are fun- 
damental for stabilizing the solution of inverse problems, and enabling various 
other tasks, such as compression, detection, separation, sampling, and more. 
What are those models? Essentially, a model poses a set of mathematical prop- 
erties that the data is believed to satisfy. Choosing these properties (i.e. the 
model) carefully and wisely may lead to a highly effective treatment of the 
signals in question and consequently to successful applications. 

Throughout the years, a long series of models has been proposed and used, 
exhibiting an evolution of ideas and improvements. In this context, the past 
decade has been certainly the era of sparse and redundant representations, a 
novel synthesis model for describing signals [21, 5, 33, 40]. Here is a brief 
description of this model: 

Assume that wc arc to model the signal x G M''. The sparse and redundant 
synthesis model suggests that this signal could be described as x = Dz, where 
D e R*^^" is a possibly redundant dictionary (n > d), and z e R", the signal's 
representation, is assumed to be sparse. Measiuing the cardinality of non-zeros 
of z using the '^o-norm', such that ||z||o is the count of the non-zeros in z, we 
expect II z|| to be much smaller than n. Thus, the model essentially assumes that 
any signal from the family of interest could be described as a linear combination 
of few columns from the dictionary D. The name "synthesis" comes from the 
relation x = Dz, with the obvious interpretation that the model describes a 
way to synthesize a signal. 

This model has been the focus of many papers, studying its core theoretical 
properties by exploring practical numerical algorithms for using it in practice 
(e.g. [10, 32, 7, 11]), evaluating theoretically these algorithms' performance 
guarantees (e.g. [25, 16, 41, 42, 2]), addressing ways to obtain the dictionary 
from a bulk of data (e.g. [22, 1, 30, 38]), and beyond all these, attacking 
a long series of applications in signal and image processing with this model, 
demonstrating often state-of-the-art results (e.g. [20, 18, 28, 34]). Today, after 
a decade of an extensive study along the above lines, with nearly 4000 papers^ 
written on this model and related issues, we can safely say that this is a mature 
and stable field, with clear theoretical foundations, and appealing applications. 

Interestingly, the synthesis model has a "twin" that takes an analysis point of 
view. This alternative assumes that for a signal of interest, the analyzed vector 
fix is expected to be sparse, where e M^'^'' is a possibly redundant analysis 
operator {p> d). Thus, we consider a signal as belonging to the analysis model 
if ||rix||o is small enough. Common examples of analysis operators include: the 
shift invariant wavelet transform rJwT [33]; the finite difference operator f^DiF, 
which concatenates the horizontal and vertical derivatives of an image and is 



^This is a crude estimate, obtained using ISI-Web-of-Science. By first searching 
Topic=(sparse and representation and (dictionary or pursuit or sensing)), 240 papers are 
obtained. Then we consider all the papers that cite the above-found, and this results with 
«3900 papers. 
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closely connected to total variation [36]; the curvelet transform [39], and more. 
Empirically, analysis models have been successfully used for a variety of signal 
processing tasks such as denoising, deblurring, and most recently compressed 
sensing, but this has been done with little theoretical justification. 

It is well known by now [19] that for a square and invertible dictionary, the 
synthesis and the analysis models are the same with D = The models 

remain similar for more general dictionaries, althoiigh then the gap between 
them is unexplored. Despite the close-proximity between the two - synthesis 
and analysis - models, the first has been studied extensively while the second 
has been left aside almost untouched. In this paper we aim to bring justice to 
the analysis model by addressing the following set of topics: 

1. Cosparsity: In Section 2 we start our discussion with a closer look at the 
sparse analysis model in order to better define it as a generative model 
for signals. We show that, while the synthesis model puts an emphasis on 
the non-zeros of the representation vector z, the analysis model draws its 
strength from the zeros in the analysis vector Six. 

2. Union of Subspaces: Section 2 is also devoted to a comparison between 
the synthesis model and the analysis one. We know that the synthesis 
model described above is an instance of a wider family of models, built 
as a finite union of subspaces [29]. By choosing all the sub-groups of 
columns from D that could be combined linearly to generate signals, we 
get an exponentially large family of low-dimensional subspaces that cover 
the signals of interest. Adopting this perspective, the analysis model can 
obtain a similar interpretation. How are the two related to each other? 
Section 2 considers this question and proposes a few answers. 

3. Uniqueness: We know that the spark of the dictionary governs the 
uniqueness properties of sparse solutions of the underdetermined linear 
system Dz = x [16]. Can we derive a similar relation for the analysis 
case? As a platform for studying the analysis uniqueness properties, we 
consider an inverse problem of the form y = Mx, where M G K^xi^ and 
m < d, and y G is a measurement vector. Put roughly (and this will 
be better defined later on), assuming that x comes from the sparse anal- 
ysis model, could we claim that there is only one possible solution x that 
can explain the measurement vector y? Section 3 presents this uniqueness 
study. 

4. Pursuit Algorithms: Armed with a deeper understanding of the anal- 
ysis model, we may ask how to efficiently find x for the above-described 
linear inverse; problem. As in the synthesis case, we can consider either 
relaxation-based methods or greedy ones. In Section 4 we present two 
numerical approximation algorithms: a greedy algorithm termed "Greedy 
Analysis Pursuit" (GAP) that resembles the Orthogonal Matching Pur- 
suit (OMP) [32] - adapted to the analysis model -, and the previously 
considered ^^-minimization approach [19, 37, 9]. Section 5 accompanies 
the presentation of GAP with a theoretical study of its performance guar- 
antee, deriving a condition that resembles the ERG obtained for OMP 
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[41]. Similarly, wc study the terms of success of the £i-minimization ap- 
proach for the analysis model, deriving a condition that is similar to the 
one obtained for the synthesis sparse model [41]. 
5. Tests: In Section 6 wc demonstrate the effectiveness of the analysis model 
and the pursuit algorithms proposed in several experiments, starting from 
synthetic ones and going all the way to a compressed-sensing test for an 
image based on the analysis model: the Shepp Logan phantom. 

Wc believe that with the above set of contributions, the cosparsc analysis model 
becomes a well-defined and competitive model to the synthesis counterpart, 
equipped with all the necessary ingredients for its practical use. Furthermore, 
this work leads to a series of new questions that are parallel to those studied for 
the synthesis model - developing novel pursuit methods, a theoretical study of 
pursuit algorithms for handling other inverse problems, training Ct just as done 
for D, and more. We discuss these and other topics in Section 7. 

Related Work. Several works exist in the literature that are related to the anal- 
ysis model. The work by Elad et. al. [19] was the first to observe the dichotomy 
of analysis and synthesis models for signals. Their study, done in the context of 
the Maximum-A-Posteriori Probability estimation, presented the two alterna- 
tives and explored cases of equivalence between the two. They demonstrated a 
superiority of the analysis-based approach in signal denoising. Further empirical 
evidence of the effectiveness of the analysis-based approach can be found in [35] 
and [37] for signal and image restoration. In [37] it was noted that the nonzero 
coefficients play a different role in the analysis and synthesis forms but the im- 
portance of the zero coefficients for the analysis model - which is reminiscent of 
signal characterizations through the zero-crossings of their undecimated wavelet 
transform [31] - was not explicitly identified. 

More recently, Candes et al. [9] provided a theoretical study on the error 
when the analysis-based ^i-minimization is used in the context of compressed 
sensing. Our work is closely related to these contributions in various ways, and 
we shall return to these papers when diving into the details of our study. 

2. A Closer Look at the Cosparse Analysis Model 

Wc start our discussion with the introduction of the sparse analysis model, 
and the notion of cosparsity that is fundamental for its definition. We also 
describe how to interpret the analysis model as a generative one (just like the 
synthesis counterpart). Finally, wc consider the interpretation of the sparse 
analysis and synthesis models as two manifestations of union-of-subspaces mod- 
els, and show how they are related. 

2.1. Introducing Cosparsity 

As described in the introduction, a conceptually simple model for data 
would be to assume that each signal we consider can be expressed (i.e., well- 
approximated) as a combination of a few building atoms. Once we take this 
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view, a simple synthesis model can be thought of: First, there is a collection 
of the atomic signals {djjj^i G R'^ that we concatenate as the columns of a 
dictionary, denoted by D e M''^". Here, typically n > d, implying that the 
dictionary is redundant. Second, the signal x G K** can be expressed as a linear 
combination of some atoms of D, thus there exists z G M" such that x = Dz. 
Third and most importantly, x must lie in a low dimensional subspace, and in 
order to ensure this, very few atoms are used in the expression x — Dz, i.e., the 
number of non-zeros ||z||o is very small. By the observation that ||z||o is small, 
we say that x has a sparse representation in D. The number k = ||z||o is the 
sparsity of x. 

Often, the validity of the above described sparse synthesis model is demon- 
strated by applying a linear transform to a class of signals to be processed and 
observing that most of the coefficients are close to zero, exhibiting sparsity. 
In signal and image processing, discrete transforms such as wavelet, Gabor, 
curvelet, contourlet, shearlet, and others [33, 39, 13, 27], are of interest, and 
this empirical observation seems to give a good support for the sparse syn- 
thesis model. Indeed, when aiming to claim optimality of a given transform, 
this is exactly the approach taken - show that for a (theoretically-modeled) 
class of signals of interest, the transform coefficients tend to exhibit a strong 
decay. However, one cannot help but noticing that this approach of validat- 
ing the synthesis model seems to actually validate another 'similar' model; we 
are considering a model where the signals of interest have sparse analysis rep- 
resentations. This point is especially pronounced when the transform used is 
over-complete or redundant. 

Let us now look more carehilly at the above mentioned model that seems 
to be similar to the sparse synthesis one. First, let CI G W^''- be a signal 
transformation or an analysis operator. Its rows are the row vectors {wj}^^-^ 
that will be applied to the signals. Applying Q, to x, we obtain the (analysis) 
representation fix of x. To capture various aspects of the information in x, we 
typically have p> d. 

For simplicity, unless stated otherwise, we shall assume hereafter that all the 
rows of CI are in general position, i.e., there are no non-trivial linear dependen- 
cies among the rows."^ 

Clearly, unless x = 0, no representation Jlx can be 'very sparse', since at 
least p — d of the coefficients of fix are necessarily non-zeros. We shall put our 
emphasis on the number of zeros in the representation, a quantity we will call 
cosparsity. 

Definition 1. The cosparsity of a signal x e M'' with respect to CI G W^'^ (or 
simply the cosparsity ofx) is defined to be: 

Cosparsity: ^ := p — ||nx||o (1) 



^Put differently, we assume that the spark of the matrix Sl^ is full, implying that every 
set of d rows from CI are linearly independent. 
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The index set of the zero entries of Ox is cahed the cosupport of x. We say that 
X has cosparse representation or x is cosparse when the cosparsity of x is large, 
where by large we mean that i is close to d. We will see that, while ^ < for an 
analysis operator in general position, there are specific examples where i may 
exceed d. 

At first sight the replacement of sparsity by cosparsity might appear to be 
mere semantics. However we will see that this is not the case. In the synthesis 
model it is the columns dj,j £ T associated with the index set T of nonzero 
coefficients that define the signal subspace. Removing columns from D not in 
T leaves this subspace unchanged. In contrast, it is the rows ujj associated with 
the index set A such that (wj,x) = 0,j € A that define the analysis subspace. 
In this case removing rows from for which {ojj , x) ^ leaves the subspace 
unchanged. 

From this perspective, the cosparse model is rather related to signal charac- 
terizations from the zero-crossings of their undecimated wavelet transform [31] 
than to sparse wavelet expansions. 

2.2. Sparse Analysis Model as a Generative Model 

In a Bayesian context, one can think of data models as generators for random 
signals from a pre-specified probability density function. In that context, the 
signals that satisfy the fc-sparse synthesis model can be generated as follows: 
First, choose k columns of the dictionary D at random (e.g. assuming a uniform 
probability). We denote the index set chosen by T, and clearly \T\ = k. Second, 
form a coefficient vector z that is fc-sparse, with zeros outside the support T. 
The k non-zeros in z can be; chosen at random as well (e.g. Gaussian iid entries). 
Finally, the signal is created by multiplying D to the resulting sparse coefficient 
vector z. 

Could we adopt a similar view for the cosparse analysis model? The answer 
is positive. Similar to the above, one can produce an ^-cosparse signal in the 
following way: First, choose £ rows of the analysis operator ft at random, and 
those are denoted by an index set A (thus, |A| = £). Second, form an arbitrary 
signal V in R** - e.g., a random vector with Gaussian iid entries. Then, project 
V to the orthogonal complement of the subspace generated by the rows of CI 
that are indexed by A, this way getting the cosparse signal x. Alternatively, 
one could first find a basis for the orthogonal complement and then generate a 
random coefficient vector for the basis. 

This way, both models can be considered as generators of signals that have 
a special structure, and clearly, the two signal generators are different. It is now 
time to ask how those two families of signals inter-relate. In order to answer 
this question, we take the union-of-subspaces point of view. 

2.3. Union-of-Subspaces Models 

It is well known that the sparse synthesis model is a special instance of a 
wider family of models called union-of-subspaces [29, 4]. Given a dictionary D, 
a vector z that is exactly /c-sparse with support T leads to a signal x = Dz = 
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Dt'^t, a linear combination of k columns from D. The notation denotes 
the sub-matrix of D containing only the columns indexed by T. Denoting 
the subspace spanned by these columns by Vt := span(dj,j e T), the sparse 
synthesis signals belong to the union of all (^) possible subspaces of dimension 
k, 

Sparse Synthesis Model: x € UT:\T\=k ^t- (2) 

Similarly, the analysis model is associated to a union of subspaces model 
as well. Given an analysis operator fJ, a signal that is exactly ^-cosparse with 
respect to the rows A from ft is simply in the orthogonal complement to these 
£ rows. Thus, we have^ JIax = 0, which implies that x G Wa, where Wa := 
spa,n{ojj,j € A)-"- = {x, (wj, x) =0,Vje A}. Put differently, we may write 
Wa = Range(nJ)-'" = Null(nA). Hence, cosparse analysis signals x belong to 
the union of all the possible such subspaces of dimension d — £, 

Cosparse Analysis Model: x e Ua:|a|=« VVa- (3) 

The following table summarizes these two unions of subspaces, where we recall 
that we consider n and D in general position. 



Model 


Subspaces 


No. of Subspaces 


Subspace dimension 


Synthesis 


Vt :=span(dj,j G T) 






k 


Analysis 


Wa := span(wj, j e A)-"- 




*^ 


d-£ 



What is the relation between these two union of subspaces, as described in 
Equations (2) and (3)? In general, the answer is that the two are different. An 
interesting way to compare between the two models is to consider an ^-cosparse 
analysis model and a corresponding {d — ^)-sparse synthesis model, so that the 
two have the same dimension in their subspaces. 

Following this guideline, we consider first a special case where £ = d—1. In 
such a case, the dimension of the analysis subspaces is — ^ = 1, and there are 
(^) of those. An equivalent synthesis union of subspaces can be created, where 
k = 1. We should construct a dictionary D with n = (^) atoms dj, where 
each atom is the orthogonal complement to one of the sets of £ rows from fi. 
While the two models become equivalent in this case, clearly n 3> p in general, 
implying that the sparse synthesis model becomes untractable since D becomes 
too large. 

By further assuming that p = d, wc get that there are exactly (^) = {^''^i) = 
d subspaces in the analysis union, and in this case n = p = d as well. Further- 
more, it is not hard to see that in this case the synthesis atoms are obtained 
directly by a simple inversion, D = 



^Note that the notation SIa refers to restricting rows from CI indexed by A, whereas in 
the synthesis case we have taken the columns. We shall use this convention throughout this 
paper, where from the context it should be clear whether rows or columns are extracted. 
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Adopting a similar approach, considering the general case where £ is a general 
value (and not necessarily d—1), one could always construct a synthesis model 
that is equivalent to the analysis one. We can compose the synthesis dictionary 
by simply concatenating all the bases for the orthogonal complements to the 
subspaces Wa. The obtained dictionary will have at most (c? — atoms. 
However, not all supports of size k are allowed in the obtained synthesis model, 
since otherwise the new sparse synthesis model will strictly contain the cosparse 
analysis one. As such, the cosparse analysis model may be viewed as a sparse 
synthesis model with some structure. 

Further on the comparison between the two models, it would be of benefit to 
consider again the case d — £ — k (i.e., having the same dimensionality), assume 
that p = n (i.e., having the same overcompleteness, for example with O ~ D"^), 
and compare the number of subspaces amalgamated in each model. For the 
sake of simplicity we consider a mild overcompleteness of p = n = 2d. Denoting 
H{t) := — tlog2t — {1 — t) log2(l — t), < i < 1, the number of subspaces of 
low dimension k <ti d — n/2 in each data model, from Stirling's approximation, 
roughly satisfies for large d: 

Synthesis: log2 ^ safe - logj ^ 

Analysis: logj ^ n ■ H ^ ^ ss n ■ H{0.5) = n. 

More generally, unless d/nfn 1, there are much fewer low- dimensional synthesis 
subspaces than the number of analysis subspaces of the same dimension. This 
is illustrated on Figure 2.3 when n — p = 2d. This indicates a strong difference 




Figure 1: Number of subspaces of a given dimension, for n = p = 2d. The solid blue curve 
shows the log number of subspaces for the synthesis model as the dimension of subspaces vary, 
while the dashed red curve shows that for the analysis model. 

in the structure of the two models: The synthesis model includes very few 
low-dimensional subspaces, and an increasingly large number of subspaces of 
higher dimension; and the analysis model contains a combinatorial number of 
low-dimensional subspaces, with fewer high dimensional subspaces. 
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Comment: One must keep in mind that the huge number of low-dimensional 
subspaces, though rich in terms of its descriptive power, makes it very difficult to 
recover algorithmically signals that belong to the union of those low-dimensional 
subspaces or to efficiently code/sample those signals (see the experimental re- 
sults in Section 6.1). This stems from the fact that in general, it is not possible 
to get cosparsity d < £ < p: any vector x that is orthogonal to d linearly inde- 
pendent rows of rj must be the zero vector, leading to an uninformative model. 
One may, however, get cosparsities in the range d < i < p when the analysis 
operator ft displays certain linear dependencies. Therefore it appears to be de- 
sirable, in the cosparse analysis model, to have analysis operators that exhibit 
highly linearly dependent structure. We will see in Section 3.4 that a leading 
example of such operators is the finite difference analysis operator. 

Another interesting point of view towards the difference between the two 
models is the following: While a synthesis signal is characterized by the support 
of the non-zeros in its representation in order to define the subspace it belong 
to, a signal from the analysis model is characterized by the locations of the zeros 
in its representation fix. The fact that this representation may contain many 
non-zeroes (and especially so when p^ d) should be of no consequence to the 
efficiency of the analysis model. 

2.4- Comparison with the Traditional Sparse Analysis model 

Previous work using analysis representations, both theoretical and algorith- 
mic, has focussed on gauging performance in terms of the more traditional 
sparsity perspective. For example, in the context of compressed sensing, recent 
theoretical work [9] has provided performance guarantees for minimum ^^-norm 
analysis representations in this light. 

The analysis operator is generally viewed as the dual frame for a redundant 
synthesis dictionary so that f2 = D^. This means that the analysis coefficients 
Clx provide a consistent synthesis representation for x in terms of the dictionary 
D, implying that the representation fix is a feasible solution to the linear system 
of equations Dz = x. 

Furthermore, if ||Jlx||o = p — i, then J2x must be an element of the fc-sparse 
synthesis model, lJT |T|=fe ^t, with k = p — £. Hence: 

{0} C IJ Wa C (J Vt C R'^. (4) 

A:|A|=p-fc T:\T\=k 

Of course, 17x is not guaranteed to be the sparsest representation of x in terms 
of D. Hence the two subspace models are not equivalent. 

Note that while in Section 2.3 the sparsity k was matched to d — £, here it 
is matched to p — £. The former was used to get the same dimensions in the 
resulting subspaces, while the match discussed here considers the vector fix as 
a candidate fc-sparse representation. 

Such a perspective treats the analysis operator as a poor man's sparse syn- 
thesis representation. That is, for certain signals x, the representation fix may 
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be reasonably sparse but is unlikely to be as sparse as, for example, the minimum 
£^-norm synthesis representation'*. 

In the context of linear inverse problems, it is tempting to try to exploit 
the nesting property (4) in order to derive identifiability guarantees in terms of 
the sparsity of the analysis coefficients fix. For example, in [9], the compressed 
sensing recovery guarantees exploit the nesting property (4) by assuming a suffi- 
cient number of observations to achieve a stable embedding (restricted isometry 
property) for the /c-sparse synthesis union of subspaces, which in turn implies a 
stable embedding of the {p — fc)-cosparse analysis union of subspaces. 

While such an approach is of course valid, it misses a crucial difference 
between the analysis and synthesis representations: they do not correspond to 
equivalent signal models. Treating the two models as equivalent hides the fact 
that they may be composed of subspaces with markedly different dimensions. 
The difference between these models is highlighted in the following examples. 

2.4-1- Example: generic analysis operators, p = 2d 

Assuming the rows of ft are in general position, then when p > 2d the 
nesting property (4) is trivial but rather useless! Indeed, if fc < rf, then the only 
analysis signal for which ||r2x||o = fc=p — i'isx = 0. Alternatively, if A; > d, 
the synthesis model is trivially the full space: UT:|T|=fc^T = 

2-4-2- Example: shift invariant wavelet transform 

The shift invariant wavelet transform is a popular analysis transform in 
signal processing. It is particularly good for processing piecewise smooth signals. 
Its inverse transform has a synthesis interpretation as the redundant wavelet 
dictionary consisting of wavelet atoms with all possible shifts. 

The shift invariant wavelet transform [33] provides a nice example of an 
analysis operator that has significant dependencies due to the finite support of 
the individual wavelets. Such nontrivial dependencies within the rows of S^wT 
mean that the dimensions of the (analysis or synthesis) signal subspaces are not 
easily characterised by either the sparsity k or the cosparsity £. However the 
behaviour of the model is still driven by the zero coefficients not the nonzero 
ones, i.e., by the zero-crossings of the wavelet transform [31]. By considering 
a particular support set of an analysis representation Hwtx with the shift 
invariant wavelet transform we can illustrate the dramatic difference between 
the analysis and synthesis interpretations of the coefficients. 

Figure 2 shows the support set of the nonzero analysis coefficients, associated 
with the cone of influence around a discontinuity in a piecewise polynomial signal 
of length 128-samples [17], using a shift-invariant Daubechies wavelet transform 
with s = 3 vanishing moments [33]. For such a signal, the cone of influence at 



''When measuring sparsity with an norm, < p < 1, rather than with p = 0, it has been 
shown [26] that for so-called localized frames the analysis coefficients fix obtained with CI = 
Dt the canonical dual frame of D are near optimally sparse: ||f2x||p < CpTxiin^^j^^—^ 
where the constant Cp does not depend on x. 
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level J in a shift invariant wavelet transform contains Lj — 1 nonzero coefficients 
where Lj is the length of the wavelet filter at level j. Note though, the nonzero 
coefhcients are not linearly independent and can be elegantly described through 
the notion of wavelet footprints [17]. 

Synthesis perspective. Interpreting the support set within the synthesis 
model implies that the signal is not particularly sparse and needs a significant 
number of wavelet atoms to describe it: in Figure 2 the size of the support 
set, excluding coefficients of scaling functions, is 122. Could the support set 
be significantly reduced by using a better support selection strategy such as £^ 
minimization? In practice, using £^ minimization, a support set of 30 can be 
obtained, again ignoring scaling coefficients. 

Analysis perspective. The analysis interpretation of the shift invariant 
wavelet representation relies on the examination of the size of the analysis sub- 
space associated with the cosupport set. From the theory of wavelet footprints, 
the dimension of this subspace is equal to the number of vanishing moments of 
the wavelet filter, which in this example is only ... 3, providing a much lower 
dimensional signal model. 

We therefore see that the analysis model has a much lower number of degrees 
of freedom for this support set, leading to a significantly more parsimonious 
model. 




Figure 2: The support set for the wavelet coefBcients of a piecewise quadratic signal using 
a J = 4 level shift invariant Daubechies wavelet transform with s = 3 vanishing moments. 
Scaling coefficients are not shown. The support set contains 122 coefficients out of a possible 
512, yet the analysis subspace has a dimension of only 3. 



2.5. Hybrid Analysis/Synthesis models? 

In this section we have demonstrated that while both the cosparse analysis 
model and the sparse synthesis model can be described by a union of subspaces 
these models are typically very different. We do not argue that one is inevitably 
better than the other. The value of the model will very much depend on the 
problem instance. Indeed the intrinsic difference between the models also sug- 
gests that it might be fruitful to explore building other union of subspace models 
from hybrid compositions of analysis and synthesis operators. For example, one 
could imagine a signal model where x = Dz through a redundant synthesis 
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dictionary but instead of imposing sparsity on z wc restrict z through an ad- 
ditional analysis operator: ||i^z||o < fc. In such a case there wiU still be an 
underlying union of subspace model but with the subspaces defined by a combi- 
nation of atoms and analysis operator constraints. A special case of this is the 
split analysis model suggested in [9]. 

3. Uniqueness Properties 

In the synthesis model, if a dictionary D is redundant, then a given signal x 
can admit many synthesis representations z, i.e., z with Dz = x. This makes the 
following type of problem interesting in the context of the sparse signal recovery: 
When a signal has a sparse representation z, can there be another representation 
that is equally sparse or sparser? This problem is wcll-undcrstood in terms of 
the so-called spark of D [16], the smallest number of columns from D that are 
linearly dependent. 

Unlike in the synthesis model, if the signal is known, then its analysis repre- 
sentation r2x with respect to an analysis operator CI is completely determined. 
Hence, there is no inherent question of uniqueness for the cosparse analysis 
model. The uniqueness question we want to consider in this paper is in the 
context of the noiseless linear inverse problem. 



where M € M™^'^, and m < d, implying that the measurement vector y € M™ is 
not sufficient to fully characterize the original signal x G K.''. For this problem we 
ask: when can we assert that a solution x with cosparsity £ is the only solution 
with that cosparsity or more? The problem (5) (especially, with additive noise) 
arises ubiquitously in many applications, and we shall focus on this problem 
throughout this paper as a platform for introducing the cosparse analysis model, 
its properties and behavior. Not to complicate matters unnecessarily, we assume 
that all the rows of M are linearly independent, and we omit noise, leaving 
robustness analysis to further work. 

For completeness of our discussion, let us return for a moment to the synthe- 
sis model and consider the uniqueness property for the inverse problem posed in 
Equation (5). Assuming that the signal's sparse representation satisfies x = Dz, 
we have that y — Mx ~ MDz. Had we known the support T of z, this linear 
system would have reduced to y = MDyz^, a system of m equations with k 
unknowns. Thus, recovery of x from y is possible only if A; < m. 

When the support of z is unknown, it is the spark of the compound matrix 
MD that governs whether the cardinality of zt is sufficient to ensure uniqueness 
- if k = ||z||o is smaller than half the spark of MD, then necessarily z is the 
signal's sparsest representation. At best, sj)arfc(MD) = m + 1, and then we 
require that the number of measurements is at least twice the cardinality k. Put 
formally, we require 



y = Mx, 



(5) 




m + 1 



2 



(6) 
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It will be interesting to contrast this requirement with the one we will derive 

hereafter for the analysis model. 



y 




M" 










3.1. Uniqueness When the Cosupport is Known 

Before we tackle the uniqueness problem for the analysis model, let us con- 
sider an easier question: Given the observations y obtained via a measurement 
matrix M, and assuming that the cosupport A of the signal x is known, what 
are the sufficient conditions for the recovery of x? The answer to this question 
is straightforward since x satisfies the linear equation 

= Ax. (7) 

To be able to uniquely identify x from Equation (7), the matrix A must have a 
zero null space. This is equivalent to the requirement 

Null(f2A) n Null(M) = Wa n Null(M) ^ {0}. (8) 

Let us now assume that M and CI are mutually independent, in the sense 
that there are no nontrivial linear dependencies among the rows of M and 
fl; this is a reasonable assumption because first, one should not be measuring 
something that may be already available from ft, and second, for a fixed 17, 
mutual independency holds true for almost all M (in the Lebesgue measure). 
Then, (8) would be satisfied as soon as dimCWA) + dim(Null(M)) < d, or 
dim(WA) < m, since dim(Null(M)) = d — m. This motivates us to define 

Knii) ■■= max dim(WA)- (9) 

\\\>e 

The quantity Kf2(£) plays an important role in determining the necessary 
and sufficient cosparsity level for the identification of cosparse signals. Indeed, 
under the assumption of the mutual independence of and M, a necessary 
and sufficient condition for the uniqueness of every cosparse signal given the 
knowledge of its cosupport A of size £ is 

< m. (10) 

3.2. Uniqueness When the Cosupport is Unknown 

The uniqueness question that we answered above refers to the case where 
the cosupport is known, but of course, in general this is not the case. We 
shall assume that we may only know the cosparsity level which means that 
our uniqueness question now becomes: what cosparsity level I guarantees that 
there can be only one signal x matching a given observation y? 

As we have seen, the cosparse analysis model is a special case of a general 
union of subspaces model. Uniqueness guarantees for missing data problems 
such as (5) with general union of subspace models are covered in [29, 4]. In 
particular [29] shows that M is invcrtiblc on the union of subspaces \J^i^yS^ if 
and only if M is invertible on all subspaces S-y + Sg for all j,0 € F. In the 
context of the analysis model this gives the following result whose proof is a 
direct consequence of the results in [29] : 
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Proposition 2 ([29]). Let UaWa. |A| = £ be the union of l-cosparse analysis 
subspaces induced by the analysis operator CI. Then the following statements 
are equivalent: 

1. If the linear system y = Mx admits an (.-cosparse solution, then this is 

the unique £-cospa,rse solution; 

2. M is invertible on UaWa; 

3. (Wai + WaJ n Null(M) ^ for any \Ki\, jAaj > £; 

Proposition 2 answers the question of uniqueness for cosparse signals in the 
context of hnear inverse problems. Unfortunately, the answer we obtained still 
leaves us in the dark in terms of the necessary cosparsity level or necessary 
number of measurements. In order to pose a clearer condition, we use Propo- 
sition 2 from [29] that poses a sharp condition on the number of measurements 
to guarantee uniqueness (when M and f2 are mutually independent): 

m>Kn{(), where «;n(^) := max {dim(WAi +WA2) : |Ai| > ^, i = 1, 2} 

(11) 

Interestingly, a sufficient condition can also be obtained using the quantity net 
defined in (9) above, which was observed to play an important role in the unique- 
ness result when the cosupport is assumed to be known. Namely, we have the 
following result. 

Proposition 3. Assume that Kn{() < ^- Then for almost all M (wrt the 
Lebesgue measure), the linear inverse problem, y = Mx has at most one i- 
cosparse solution. 

Proof. Assuming the mutual independence of Jl and M, which holds for almost 
all M, we note that the uniqueness of £ cosparse solutions holds if and only 
if: dim(WAi + Was) < fn, whenever |Aj| > i = 1, 2. Assume that Kn{£) < 
m/2. By definition of ko, if [A^j > £, i = 1,2, then dim(WAi) < hence 

dim(WAi + WA2) < "I- □ 

In the synthesis model the degree to which columns are interdependent can 
be partially characterized by the spark of D [16] defined as the the smallest 
number of columns of D that are linearly dependent. Here the function 
plays a similar role in quantifying the interdependence between rows in the 
analysis model. 

Remark 4. The condition ko(^) < ^ is in general not necessary while condi- 
tion (11) is. 

There are two classes of analysis operators for which the function is 

well-understood: analysis operators in general position and the finite difference 
operators. We discuss the uniqueness results for these two classes in the follow- 
ing subsections. 



14 



3.3. Analysis Operators in General Position 

It can be easily checked that Kn(^) = max((i — ^,0). This enables us to 
quantify the exact level of cosparsity necessary for the uniqueness guarantees: 

Corollary 5. Let S W^'^ be an analysis operator in general position. Then, 
for almost all m x d matrix M, the following hold: 

• Based on Eq. (10), if m > d — i, then the equation y = Mx has at most 
one solution with known cosupport A (of cosparsity at least I ); 

• Based on Proposition 2, if m > 2{d — t), then the equation y = Mx has 
at most one solution with cosparsity at least £. 

3.4- The Finite Difference Operator 

An interesting class of analysis operators with significant linear dependencies 
is the family of finite difference operators on graphs, I^dif- These are strongly 
related to TV norm minimization, popular in image processing applications [36], 
and has the added benefit that wc arc able to quantify the function ko. and hence 
the uniqueness properties of the cosparse signal model under I^dif- 

We begin by considering Odif on an arbitrary graph before restricting our 
discussion to the 2D lattice associated with image pixels. Consider a non- 
oriented graph with vertices V and edges E cV'^. An edge e is a pair e = {vi,V2) 
of connected vertices. For any vector of coefficients defined on the vertices, 
X G M^, the finite difference analysis operator JIdif computes the collection of 
differences {x{vi) — x{v2)) between end-points, for all edges in the graph. Thus, 
an edge e G E may be viewed as a finite difference on R^. 

Can we estimate the function Ksn-Dip{£)'^ The following shows that it is 
intimately related to topological properties of the graph. For each sub-collection 
A c E of edges, we can define its vertex-set V{A) c V as the collection of 
vertices covered by at least one edge in A. The support set ^(A) of A can be 
decomposed into J(A) connected components (a connected component is a set 
of vertices connected to one another by a walk through vertices in A). It is easy 
to check that a vector x belongs to the space Wa = Null(r2A) if and only if its 
values are constant on each connected component. As a result, the dimension 
of this subspace is given by 

dim(WA) = |y|-|y(A)| + J(A) 

where the \V\ — \V{A)\ vertices out of V are associated to arbitrary values in 
X that are distinct from all their neighbors, while all entries from each of the 
J(A) connected components have an arbitrary common value. It follows that 

KoW = max - \V{A)\ + J(A)} = \V\ - min {IV-CA)! - J(A)} (12) 

Because of the nesting of the subspaces Wa, the minimum on the right hand 
side is achieved when |A| = i. 
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Uniqueness Condition for Cosparse Images with respect to the 2D Odif- In the 
abstract context of general graph the characterization (12) may remain obscure, 
but can we get more concrete estimates by specializing to the 2D regular graph 
associated to the pixels of an x image? It turns out that one can obtain 
relatively simple upper and lower bounds for khuif ^^'^ hence derive an easily 
interpretable uniqueness condition (see Appendix C for a proof): 

Proposition 6. Let fluiF be the finite difference analysis operator that com- 
putes horizontal and vertical discrete derivatives of a d = N x N image. For 
any £ we have 

rf-^-/|-l<«J^o:pW<rf-^- (13) 

As a result, assuming that M is 'mutually independent' from CtuiF, we have: 

• Based on Eq. (10), if m>d — £/2, that is to say 

e>2d- 2m, (14) 

then the equation y = Mx has at most one solution with known cosupport 

A (of cosparsity at least £ ); 

• Based on Proposition 2, if m> 2{d — £/2) =2d — £, that is to say 

£>2d-m, (15) 

then the equation y = Mx has at most one solution with cosparsity at 

least £. 

Note that as soon as the matrix M is associated to an underdetermined linear 
system, i.e., when m < d, we need £ > 2d — m > d to exploit the uniqueness 
guarantee (15). 

The 2D rioiF, Piecewise Constant Images, and the TV norm,. The 2D finite 
difference operator is closely related to the TV norm [36]: the discrete TV norm 
of X is essentially a mixed £'^ — £^ norm of riDirx. Just like its close cousin TV 
norm minimization, the minimization of j|r2x||o is particularly good at inducing 
piecewise constant images. We illustrate this through a worked example. 

Consider the popular Shepp Logan phantom image shown in left hand side 
of Figure 3. This particular image has 14 distinct connected regions of constant 
intensity. The number of non-zero coefficients in the finite difference representa- 
tion is determined by the total length (Manhattan distance) of the boundaries 
between these regions. For the Shepp Logan phantom this length is 2546 pixel 
widths and thus the cosparsity is = 130560 — 2546 = 128014. Furthermore, as 
there are no isolated pixels with any other intensity, all pixels belong to a con- 
stant intensity region so that |V(A)| = \ V\ and the cosupport has an associated 
subspace dimension of: 

dim(WA) = (|V^|-|l^(A)|) + J(A) 
= 14 
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Figure 3: An example of a piecewise constant image: the 256 X 256 Shepp Logan phantom 
(left); and an image with the same cosparsity, I = 128014, but whose cosupport is associated 
with an empirically maximum subspace dimension (right). 



In order to determine when the Shepp Logan image is the unique solution 
to y = Mx with maximum cosparsity it is necessary to consider the maximum 
subspace dimension of all possible support sets with the same cosparsity. This is 
the quantity measured by The right hand image in Figure 3 shows an 

image with equal copsparsity but whose support is associated with the highest 
dimensional subspace we could find: dim(WA) = 1276. Comparing this to the 
bounds given in (13) of Proposition 6 

1270< < 1524, 

suggests that the lower bound is reasonably tight in this instance. Note, as 
explained in Appendix C, this image has a single connected subgraph, A, which 
is nearly square. The uniqueness result from Proposition 6 then tells us that 
a sufficient number of measurements to uniquely determine the Shepp Logan 
image is given by m = 2KnDiF (128014) which is somewhere between 2552 (if our 
empirical estimate is accurate) and 3048 (worst case). 

Wc will revisit this again in Subsection 6.2 where wc investigate the empirical 
recovery performance of some practical reconstruction algorithms. 

3.5. Overview of cosparse vs sparse models for inverse problems 

To conclude this section, Figure 4 provides a schematic overview of analy- 
sis cosparsc! models vs synthesis sparse models in the context of linear inverse 
problems such as compressed sensing. In the synthesis model, the signal x is 
a projection (through the dictionary D) of a high-dimensional vector z living 
in the union of sparse coefficient subspaces; in the analysis model, the signal 
lives in the pre-image by the analysis operator ft of the intersection between 
the range of fl and this union of subspaces. For a given sparsity of z, this is 
usually a set of much smaller dimensionality. 

4. Pursuit algorithms 

Having a theoretical foundation for the uniqueness of the problem 

x = argmin ||r2x||o subject to Mx = y, (16) 
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Coefllcient Domain 




Figure 4: A schematic overview of analysis cosparsc vs synthesis sparse models in relation 
with compressed sensing. 



we turn now to the question of how to solve it: algorithms. In this section we 
present two algorithms, both targeting the solution of problem (16). As in the 
uniqueness discussion, we assume that M S M™^'', where m < d. This implies 
that the equation Mx = y has infinitely many possible solutions, and the term 
||r2x||o introduces the analysis model to regularize the problem. 

The first algorithm we present, the analysis £i-minimization, is well-known 
and widely used already in practice, see e.g. [20, 40]. The other algorithm 
we discuss is a variant of well-known greedy pursuit algorithm used for the 
synthesis model - the Orthogonal Matching Pursuit (OMP) algorithm. Similar 
to the synthesis case, our goal is to detect the informative support of fix - 
as discussed in Section 3.1, in the analysis case, this amounts to the locations 
of the zeros in the vector Jlx, so as to introduce additional constraints to the 
underdetermined system Mx — y. Note that for obtaining a solution, one needs 
to detect at least d — to of these zeros, and thus if £ > d — to, detection of the 
complete set of zeros is not mandatory. Of course, there can be many more 
possibilities to solve (16) or to find approximate solutions of it. We mention a 
few works where some of such methods can be found: [35, 37, 6]. 

4-1. The Analysis £i -minimization 

Solving (16) can be quite difficult. In fact, the synthesis counterpart of (16) 
is known to be NP-hard in general. As is well-known, a very effective way to 
remedy this situation is to modify (16) and to solve: 

x = argmin ||rix||i subject to Mx = y. (17) 
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The attractiveness of this approach comes from that (17) is a convex problem 
and hence admits computationally tractable algorithms to solve it, and that the 
£i-norm promotes high cosparsity in the solution x. An algorithm that targets 
the solution of (17) and its convergence analysis can be found in [6]. 

4-2. The Greedy Analysis Pursuit Algorithm (GAP) 

The algorithm wc present in this section is named Greedy Analysis Pursuit 
(GAP). As mentioned at the beginning of the section and as the name suggests, 
this algorithm aims to find the cosupports of cosparse signals in a greedy fashion. 

An obvious way to find the cosupport of a cosparse signal would proceed 
as follows: First, obtain a reasonable estimate of the signal from the given 
information. Using the initial estimate, select a location as belonging to the 
cosupport. Having this estimated part of the cosupport, we can obtain a new 
estimate. One can now see that by alternating the two previous steps, we will 
have identified enough locations of the cosupport to get the final estimate. 

However, the; GAP works in an opposite direction and aims to detect the 
elements outside the set A, this way carving its way towards the detection of 
the desired cosupport. Therefore, the cosupport A is initialized to be the whole 
set {1, 2, 3, ... and through the iterations it is reduced towards a set of 
size i (or less, d — m). 

Let us discuss the algorithm with some detail. First, the GAP uses the 
following initial estimate: 

xo = argmin ||nx||2 subject to y = Mx. (18) 

X 

Not knowing the locations of the cosupport but knowing that many entries of 
fixo are zero, this is a reasonable first estimate of xq. Once we have xq, we 
can view JIxq as an estimate of J7xo. Hence, we find the location of the largest 
entries (in absolute value) of J7xo and regard them as not belonging to the 
cosupport. After this, we remove the corresponding rows from ft and work with 
a reduced $7. A detailed description of the algorithm is given in Figure 5. 

Some readers may notice that the GAP has similar flavors to the FOCUSS 
[23] and the IRLS [12]. This is certainly true in the sense that the GAP solves 
constrained least squares problems and adjusts weights as it iterates. However, 
the weight adjustment in the GAP is more aggressive (removal of rows) and 
binary in nature. 

Stopping criterion / targeted sparsity. In GAP, we debate between using the 
full £ zeros in the product fix versus a minimal and sufficient set of d — m 
zeros. In between these two values, and assuming that the proper elements of 
A have been detected, we expect the solution obtained by the algorithms to be 
the same, with a slightly better numerical stability for a larger number of zeros. 

Thus, an alternative stopping criterion for the GAP could be to detect 
whether the solution is static or the analysis coefficients of the solution are 
small. This way, even if the GAP made an error and removed from Afc an index 
that belongs to the true cosupport A, the tendency of the solution to stabilize 
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xa.SK. Mpproxirnatc tne solution or ^lo^. 


# 


Pcirameters; Given are the matrices IVI, fi, the vector y, the tsrget 




number of zeros £, and a selection factor t G (0, 1]. 


• 


Initialization: Set fc = and perform the following steps: 




— Initialize Cosupport: Afc = {1,2,3, ... ,p}, 




— Initialize Solution: 




Xfc = argmin x|j2 subject to y = Mx. 


• 


GAP Iterations: Increment fc by 1 and perform the following steps: 




— Project: Compute a = r2xfc_i, 




— Find largest entries: Ffc = {i : \ai\ > tmauXj \aj\}, 




— Update Support: = Afc_i \rfc, and 




— Update Solution: 




Xfc = argmin ||f2^ x||2 subject to y = Mx. 




— stopping Criterion: \f k > p — d + m (or k > p ~ £), stop. 


• 


Output: The proposed solution is xqap = x^ obtained after k iterations. 



Figure 5; Greedy Analysis Pursuit Algoritlim (GAP) 



could help in preventing the algorithm to incorporate this error into the solution. 
In fact, this criterion is used in the experiment in Section 6. 

Multiple selections.. The selection factor < t < 1 allow the selection of mul- 
tiple rows at once, to accelerate the algorithm by reducing the number of iter- 
ations. 

Solving the required least squares problems. The solution of Eq. (18) (and of 
the adjusted problems with reduced at subsequent steps of the algorithm) is 
given analytically by 



"M 


t 


y 










In practice, instead of (18), we compute 

Xo = argmin {||y - Mx||| + A||f2x||2} = argmin 





y 




M 


X 
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for a small A > 0, yielding the solution 



5. Theoretical analysis 

So far, we have introduced the cosparse analysis data model, provided unique- 
ness results in the context of linear inverse problems for the model, and described 
some algorithms that may be used to solve such linear inverse problems to re- 
cover cosparse signals. Before validating the algorithms and the model pro- 
posed with experimental results, we first investigate theoretically under what 
conditions the proposed algorithms to solve cosparse signal recovery (16) are 
guaranteed to work. After that, we discuss the nature of the condition derived 
by contrasting it to that for the synthesis model. Further discussion including 
some desirable properties of CI and M can be found in Appendix D. 

5.1. A Sufficient Condition for the Success of the ii -minimization 

In the sparse synthesis framework, there is a well-known necessary and suf- 
ficient condition called the null space property (NSP) [15] that guarantees the 
success of the synthesis ^i-minimization 

zo := argmin ||z||i subject to y = *z (19) 

z 

to recover the sparsest solution, say zq, to y = *z. To elaborate, in the case 
of a fixed support T, the €i-minimization (19) recovers every sparse coefficient 
vector Zq supported on T if and only if 

||zt||i < ||zTc||i, Vz e Null(*), z ^ 0. (20) 

The NSP (20) cannot easily be checked but some 'simpler' sufficient conditions 
can be derived from it; for example, one can get a recovery condition of [41] 
called the Exact Recovery Condition (ERC): 

|||*5.*Tc|||i^i <1, (21) 

which also implies the success of greedy algorithms such as OMP [41]. Note that 
here we used the symbol # for an object which may be viewed as a dictionary or 
a measurement matrix. Separating the data model and sampling, we can write 
$ = MD as was done in Section 3. 

One may naturally wonder: is there a condition for the cosparse analysis 
model that is similar to (20) and (21)? The answer to this question seems to be 
affirmative with some qualification as the following two results show (the proofs 
are in Appendix A): 
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Theorem 7. Let A be a fixed cosupport. The analysis ii -minimization 

xo := argmin ||rix||i subject to y := Mxq = Mx (22) 

X 

recovers every Xq with cosupport A as a unique minimizer if, and only if, 

sup |(nAcz,sign(nAcXA))| < llf^Azlli, Vz € Null(M), z ^ 0. (23) 

XA:nAXA=0 

Corollary 8. Let be any dx (d—m) basis matrix for the null space Null(M), 
and A be a fixed cosupport such that the £ x {d — m) matrix HaN"^ is of full 
rank d — m. If 

sup ||(NnJ)tNnXcSign(nAcXA)||oo < 1, (24) 

XA:nAXA=0 

then the analysis £i-minimization (22) recovers every Xq with cosupport A. 
Moreover, if 

|||(Nf2l)tNf2l.||U^oo = |||f^A=N^(f^AN^)^|||i^i < 1 (25) 

then condition (24) holds true. 

There is an apparent similarity between the analysis ERC condition (25) 
above and its standard synthesis counterpart (21), yet there are some subtle 
differences between the two that will be highlighted in Section 5.3. 

5.2. A Sufficient Condition for the Success of the GAP 

There is an interesting parallel between the synthesis ERC (21) and its anal- 
ysis version in Corollary 8; namely, the analysis ERC condition (25) also implies 
the success of the GAP algorithm, as we will now show. 

From the way GAP algorithm works, we can guarantee that it will perform 
a correct elimination at the first step if the largest analysis coefficients of Oa^xo 
of the first estimate xq are larger than the largest of OaXq where A denotes 
the true cosupport of xq. This observation suggests that we can hope to find 
a condition for success if we can find some relation between nA<=xo and HaXq. 
The following result provides such a relation: 

Lemma 9. Let be any dx {d — m) basis matrix for the null space Null(M) 
and A be a fixed cosupport such that the £ x {d — m) matrix HaN-^ is of full 
rank d — rn. Let a signal xq with ^Ia^o = and its observation y = Mxq be 
given. Then the estimate xq in (18) satisfies 

HaXo = -(NOX)tNnJenAcXo. (26) 

Having obtained a relation between HaXo and HacXq, we can derive a suf- 
ficient condition which guarantees the success of GAP for recovering the true 
target signal xq: 
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Theorem 10. Let N'^ he any dx (d—m) basis matrix for the null space Null(]V[) 
and A be a fixed cosupport such that the £ x (d — m) matrix 17aN-^ is of full 
rank d — m. Let a signal xq with ^axo = and an observation y = Mxq he 
given. Suppose that the analysis ERC (25) holds true. Then, when applied to 
solve (16), GAP with selections factor t > |||(NnJ)'l'NnJc||| oo->.c>o will recover 
Xq after at most |A^| iterations. 

Proof. At the first iteration, GAP is doing the correct thing if it removes a row 
from Cl\c. Clearly, this happens when 

||f^AXo||oo < tWClAc (27) 

In view of (26), if (25) holds and t > |||(NriJ)tNf2j<, |||oo^oo, then (27) is 
guaranteed. Therefore, GAP successfully removes a row from il^c at the first 
step. 

Now suppose that (25) was true and GAP has removed a row from ilj^c at 
the first iteration. Then, at the next iteration, we have the same JIa and, in 
the place of Ct\o, a submatrix Oa= of flA'= (with one fewer row). Thus, we can 
invoke Lemma 9 again and we have 

Oaxi = - (NnJ)^NJ7jci7AcXi. 

Let Ro := (NfiJ)^ Nfjjc and Ri := (NHJ)'^ NU^^. We observe that Ri is a 
submatrix of Rq obtained by removing one column. Therefore, 

^^lllloo— >oo ^ lll^^ollloo— >oo — ^* 

By the same logic as for the first step, the success of the second step is guaran- 
teed. Repeating the same argument, we obtain the conclusion. □ 

Remark 11. As pointed out at the beginning of the subsection, the Exact Re- 
covery Condition (25) for the cosparse signal recovery guarantees the success of 
both the GAP and the analysis £i-minimization. 

5.3. Analysis vs synthesis exact recovery conditions 

When * is written as MD, the exact recovery condition (21) for the sparse 
synthesis model is equivalent to 

|||(MDt)^MDtc|||i^i < 1. (28) 

Here, T is the support of the sparsest representation of the target signal. At 
first glance, the two conditions (28) and (25): 

|||f2AcN^(f2AN^)t|||i^i < 1 

look similar; that is, for both cases, one needs to understand the characteristics 
of a single matrix, fiN"^ for the cosparse model, and MD for the sparse model. 
Moreover, the expressions involving these matrices have similar forms. 
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However, upon closer inspection, there is a crucial difference in the structures 
of the two expressions. In the synthesis case, the operator norm in question 
depends only on how the columns of MD are related, since a more explicit 
writing of the pseudo-inverse shows that the matrix to consider is 

(D^M^MDt)"^(MDt)'^MDtc 

This fact allows us to obtain more easily characterizable conditions like inco- 
herence assumptions [41] that ensure condition (28). 

To the contrary, in the analysis case, more complicated relations among the 
rows and the columns of ON^ have to be taken into account. The matrix to 
consider being 

the inner expression NJIJOaN-^ is connected with how the columns of fiN-^ 
are related. However, because the matrices flAcN"^ and NJlJ appear outside, 
it also becomes relevant how the rows of ilN'^ are related. 

There is also an interesting distinction in terms of the sharpness of these 
exact recovery conditions. Namely, the violation of (28) implies the failure of 
the OMP in the sense that there exist a sparse vector x = D^zt for which the 
first step of OMP picks up an atom which is not indexed by T. To the opposite, 
the violation of (25) does not seem to imply the necessary "failure" of GAP in 
a similar sense. 

Note however that both conditions are not essential for the success of the 
algorithms. One of the reasons is that the violation of the conditions does not 
guarantee that the algorithms would select wrong atoms. Furthermore, even if 
the GAP or the OMP "fails" in one step, that does not necessarily mean that 
the algorithms fail in the end: further steps may still enable them to achieve an 
accurate estimate of the vector xq. 

5.4- Relation to the Work by Candes et. al. [9] 

Before moving onto experimental results, we discuss the recovery guarantee 
result of Candes et al. [9] for the algorithm 

X = argmm ||D'^x||i subject to ||Mx-y||2<e (29) 

when partial noisy observation y = Mx -|- w with ||w||2 < e is given for an 
unknown target signal x. 

In order to derive the result, the concept of D-RIP is introduced [9]: A 
measurement matrix M satisfies D-RIP adapted to D with constant 5^ if 

(l-OlMli<l|Mt;||i<(l + Olbll2 

holds for all v that can be expressed as a linear combination of s columns of D. 
With this definition of D-RIP, the main result of [9] can be stated as follows: 
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For an arbitrary tight frame D and a measurement matrix M satisfying D-RIP 
with d^g < 0.6, the solution x to (29) satisfies 



||x - x||2 < 



<C„. + C."°"' (30) 



where the constants Co and Ci may depend only on 5^^, and the notation (c)^ 
represents a sequence obtained from a sequence c by keeping the s-largest values 
of c in magnitude (and setting the other to zero). 

The above recovery guarantee is one of the few — very likely the only — results 
existing in the literature on (29). However, we observe that there is much room 
for improving the result. We now discuss why we hold this view. For clarity 
and for the purpose of comparison to our result, we consider only the case e = 
for (29). 

First, we note that [9] implicitly uses the estimate of type ||nAoz||i < ||r2Az||i 
for (23). Hence, the main result of [9] cannot be sharp in general due to the 
fact that the sign patterns of (23) are ignored^ 

Second, the quality of the bound ||D^x— (D-^x)s||i/y^ in (30) is measured 
in terms of how effective D^x is in sparsifying the signal x with respect to the 
dictionary D. To explain, let us consider the synthesis ^i-minimization 

Ai(x) := argmin||z||i subject to MDz = Mx (31) 

and let Aq (x) be the sparsest representation of x. Applying the standard result 
for the synthesis ^i-minimization, we have 

||Ar(x)-Ao(x)H.<c J'^°W-(^°("»-ll^ 

provided that MD satisfies the standard RIP with, e.g., 623 < \/2— 1 ~ 0.414. 
Since D is a tight frame, it is equivalent to 

||DA,(x) - x|b < cJ|Ao(x)-(Ao(x)).||,_ 

Note that both Aq (x) and D^x are legitimate representations of x since DAq (x) = 
X = DD^x. Thus, Ao(x) is sparser than D^x in general; in this sense, 
D^x is not effective in sparsifying x. Given this, we expect that ||Ao(x) — 
(Ao(x))s||i/yi is smaller than ||D^x - (D^x)s||i/v^. We now see that (30) 
with e = and (32) are of the same form. Furthermore, given the degree of 
restriction on the RIP constants {S^s < 0.6 vs. 52s < 0.414), we can only ex- 
pect that the constant C2 is smaller than Ci. From these considerations, (30) 



*Note that the same lack of sharpness holds true for our results based on (25), yet we will 
see that these can a<;tually provide cosparse signal recovery guarantees in simple but nontrivial 
cases. 
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only lets us to conclude that analysis -minimization (17) performs on par with 
synthesis £i-minimization (31), or tends to perform worse. 

Third, the nature of the formulation in (30) takes the view that the cosparse 
signals are the same as the sparse synthesis signals as described in Section 2.4. 
Due to this, the only way for (30) to explain that the cosparse signals are 
perfectly recovered by analysis .^i -minimization is to show that D^x is exactly 
s-sparsc for some ,s > with D-RIP constant 5^^ < 0.6. Unfortunately, we 
can quickly observe that the situation becomes hopeless even for moderately 
overcomplete D; for example, let D be a 1.15-times overcomplete random tight 
frame for and consider recovering [d — l)-cosparsc signals for the operator 
D^. Note that (d— l)-cosparse signals x lead to (0.15d-|-l)-sparse representation 
D^x. This means that we need 61^^ i^d+i) ~ ^?.05d+7 *o ^e smaller than 0.6 to 
show that x can be recovered with analysis £i, which of course cannot happen 
since 6^ > 1. By taking the synthesis view of the signals, (30) cannot explain 
the recovery of the simplest cosparse signals (cosparsity — 1) no matter what 
M is (as long as it is under-determined). 

We also observe that the result of [9] cannot say much about the recovery of 
cosparse signals with respect to the finite difference operators Odif discussed 
in Section 3. This is due to the fact that O^jp is not a tight frame. How docs 
our recovery result (25) fare in this regard? For illustration, we took il to be 
the finite difference operator JIdif for 32 x 32 images (thus, d = 1024). As a 
test image, we took x to be constant in the region : i, j = 1, . . . , 16} and 

{{hj) • = 1) ■ • ■ ) 16}"^. For this admittedly simple test image, we computed 
the operator norm in (25) for random measurement matrices M G ]R640x1024 
When the operator norm was computed for 100 instances M, it was observed to 
be less than 0.726. Hence, our result does give the guarantee of cosparse signal 
recovery in simple cases. 

6. Experiments 

Empirical performance of the proposed algorithms is presented in this sec- 
tion. First, we show how the algorithms perform in synthetic cosparse recovery 
problems. Second, experimental results for an analysis-based compressed sens- 
ing are presented. 

6.1. Performance of analysis algorithms 

In this section, we apply the algorithms described in Section 4 to synthetic 

cosparse recovery problems. In the experiment, the entries of M G K^x^^ were 
drawn independently from the normal distribution. For the analysis operator 
e M^^'', it was constructed so that its transpose is a random tight frame with 
unit norm columns — we will simply say that $7 is a random tight frame in this 
case.^ Next, the co-sparsity £ was chosen, and the true or target signal x was 



^One could also construct f2 by simply drawing the rows of it randomly and independently 
from S''"^ without the tight frame constraint. We have run the experiment for such operators 
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generated randomly as described in Section 2.2. The observation was obtained 
by y = Mx. 

We have used Matlab cvx package [24] with the precision set to best for 

the analysis-i'i. For the final results, wc used the estimate x from f.i solver to 
obtain an estimate of the cosupport — the cosupport estimate was obtained by 
taking the indices for which the corresponding analysis coefficient is of size less 
than 10~^ — and then using this cosupport and the observation y to compute 
the final estimate of x (this process can be considered as d(!-biasing.). 




"5" " " " ' " " " " "5" " " " ' " " " " "5" 

Figure 6: Recovery Rate of Analysis Algorithms for d = 200. The figures correspond to GAP 
(top) and LI (bottom) with u = 1 (left), a = 1.2 (center) and cr = 2 (right). 

Figure 6 shows the results. In all cases, the signal dimension d is set to 200. 

We then varied the number m of measurements, the co-sparsity £ of the target 
signal, and the operator size p according to the following formulae: 

m = 5d, i = d — pm, p = ad. 

which is consistent with Donoho & Tanner's notations for phase transition dia- 
grams [14]: S = m/d is the undersampling ratio, and p = {d — £)/m measures 
the relative dimension of the ^-cosparse subspaces compared to the number of 
measures. For every fixed parameter triplet ((7,S,p), the experiment was re- 
peated 50 times. A relative error of size less than 10~^ was counted as perfect 
recovery. Each pixel in the diagrams corresponds to a triplet (cr, 6, p) and the 
pixel intensity represents the ratio of the signals recovered perfectly with white 
being the 100% success. 

The figures show that the GAP can be a viable option when it comes to the 
cosparsc signal recovery. What is a bit unexpected is that GAP performs better 
than l!i-minimization, especially for overcomplete 17's. Yet, it should be clear 



and observed that the result was similar. 
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from its description that GAP has polynomial complexity, and it is tractable in 
practice. 

An interesting phenomenon observed in the plots for overcomplete Cl is that 

there seems to be some threshold (5, such that if the observation to dimension 
ratio 6 is less than 5*, one could not recover any signal however cosparse it 
may be. We may explain this heuristically as follows: If m measurements are 
available, then the amount of information wc have for the signal is cim. whore ci 
is the number of bits each observation represent. In order to recover a cosparse 
signal, we need first to identify which subspace the signal belongs to out of (^) , 
and then to obtain the d — £ coefficients for the signal with respect to a basis of 
the d — £ dimensional subspace. Therefore, roughly speaking, one may hope to 
recover the signal when 

cim > log2 +ci{d-e)= log2 + pciro. 

Thus, the recovery is only possible when (1 — p)6 > loga {j)/{ci<i)- Using 
the relation p = ad and Stirling's approximation, this leads to an asymptotic 
relation 

5 > (1 - p)<5 > ^^^i^^i^^llM^, 

Cl 

which explains the phenomenon. 

The calculation above and the experimental evidence from the figures con- 
firm the intuition we had in Section 2.3: The combinatorial number of low- 
dimensional cosparse subspaces arising from analysis operators in general po- 
sition is not desirable. This strengthens our view on the necessity of design- 
ing/learning analysis operators with high linear dependencies. 

6.2. Analysis-based Compressed Sensing 

We observed in Section 6.1 that the cosparse analysis model facilitates eff'ec- 

tivc algorithms to recover partially observed cosparse signals. In this section, 
we demonstrate the effectiveness of GAP algorithm on a standard toy problem: 
the Shepp Logan phantom recovery problem. 

Wc consider the following problem that is related to computed tomography 
(CT): There is an image, say of size nxn, which we are interested in but cannot 
observe directly. It can only be observed indirectly by means of its 2D Fourier 
transform coefficients. However, due to high cost of measurements or some 
physical limitation, the Fourier coefficients can only be observed along a few 
radial lines. These limited observations or the locations thereof can be modeled 
by a measurement matrix M, and with the obtained observation we want to 
recover the original image. As an ideal example, we consider the Shepp Logan 
phantom. One can easily see that this image is a good example of cosparse 
signals in rioiF which consists of all the vertical and horizontal gradients (or 
one step differences). This image has been used extensively as an example in 
the literature in the context of compressed sensing (see, e.g., [8, 3]). 
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Figure 7 is the result obtained using GAP. The number of measurements 
that corresponds to 12 radial lines is m = 3032. Compared to the number 
of pixels in the image d = 65536, it is approximately 4.63%. The number of 
analysis atoms that give non-zero coefficients is p — £ = 2546. The size of 
J7dif is roughly twice the image size d = 65536, namely p = 130560. At first 
glance, this corresponds to very high co-sparsity level {£ = 130560 — 2546), or 
put differently, given the high cosparsity level i = 128014, wc seem to have 
required too many measurements. However, using the conjectured near optimal 
necessary condition for uniqueness guarantee (13), we may have uniqueness 
guarantee when m > 2551. Also, using the sufficient condition (15), one would 
want to have m > 3058 measurements. In view of this, the fact that GAP 
recovered the signal perfectly for 3032 measurements is remarkable! 




Figure 7: Recovery of 256 X 256 Shepp Logan phantom image. Prom top to bottom, left to 
right: (a) Original Image, (b) Sampling locations of Fourier coefficients, (c) Reconstructed 
image, (d) Locations where one-step difference of the original image is non-zero. Upper half 
corresponds to the horizontal differences and lower half the vertical differences, (c) Locations 
that GAP identified/eliminated to be the ones where the differences arc likely non-zero, (f) 
Locations that GAP failed to identify as non-zero locations. Blank black figure indicates that 
none of the non-zero locations were missed (perfect reconstruction) . 

We have also ran the GAP algorithm for a larger sized 512 x 512 problem. 
The results (not shown here) are visually similar to Figure 7. In this case, 
the number of measurements (m = 7112) represents approximately 2.71% of 
the image size {d ~ 262144). The number of non-zero analysis coefficients is 
p — £ = 5104. The sufficient uniqueness condition (15) gives m > 6126 as a 
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number of measurements for the uniqueness. 

Remark 12. Due to the large size of these problems, GAP algorithm as de- 
scribed in Section 4 had to be modified: We used numerical optimization to 
compute pseudo-inverses. Also, due to high computational cost, we eliminated 
many rows at each iteration (super greedy) instead of one. Although this was not 
implemented using a selection factor, this can be interpreted as using varying 
selection factors < tk < 1 along the iterations. 

To conclude this section, we have repeated the 256 x 256 Shepp Logan phan- 
tom image recovery problem for several algorithms while varying the number of 
radial observation lines. Given that we know the minimal theoretical number 
and a theoretically sufficient number of radial observation lines for the unique- 
ness guarantee, the experimental result gives us an insight on how various al- 
gorithms actually perform in the recovery problem in relation to the amount of 
observation available. Figure 8 shows the outcome. The algorithms used in the 
experiment are the GAP, the TV-minimization from llmagic, the AIHT from 
[3], and the back-projection algorithm.^ The GAP and llmagic can be viewed 
as analysis-based reconstruction algorithms while the AIHT is a synthesis-based 
reconstruction algorithm. The AIHT is seen to use Haar wavelets as the synthe- 
sis dictionary, hence the algorithm implicitly assumes that the phantom image 
has sparse representation in that dictionary. We remark that while Figure 8 
gives an impression that the AIHT does not have any improvement over the 
baseline back-projection algorithm, perfect reconstructions were observed for 
the former when sufficient measurements were available, which is not the case 
for the latter. 
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Figure 8; SNR vs the number of radial observation lines in 256 X 256 Shepp Logan phantom 
image recovery. The output line for the GAP is clipped due to high SNR value. 



^The code for llmagic was downloaded from http://www.acm.caltech.edu/llmagic/ and 
the one for AIHT from http: / /www. personal . sotoii.ac.uk/tblm08/sparsify/AlHT_Paper_Code . zip. 

The result for the back-projection was obtained using the code for AIHT. 
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Remark 13. It must he noted that in our experiment, eaeh radial line consists 
of N pixels for an N x N image; this is in contrast to the fact that the radial 
lines in the existing codes, e.g. llmagic, have N — 1 pixels. We have made 
appropriate changes for our experiment. The radial lines with N — 1 pixels do 
make the recovery problem more difficult and more observations were required 
for perfect recovery for the GAP. 

7. Conclusions and Further Work 

In this work, wc have described the cosparse analysis data model as an al- 
ternative to the popular sparse synthesis model. By the description, we have 
shown that the cosparse analysis model is distinctly diflFerent from the sparse 
synthesis one in spite of their apparent similarities. In particular, treating the 
cosparse model as the synthesis model by assuming that the analysis representa- 
tions of cosparse signals are sparse was demonstrated to be not very meaningful. 
Having had presented the model, we have stated conditions that guarantee the 
uniqueness of cosparse solutions in the context of linear inverse problems based 
on the work [29] . We then presented some algorithms for the cosparse recovery 
problem and provided some theoretical result for the analysis i'l-minimization 
and the newly proposed GAP. Lastly, the model and the proposed algorithm 
were validated via experimental results. 

Although our work in this paper shows that the cosparse analysis model 
together with algorithms based on the model is an interesting subject to study 
and viable for practical applications, there are much more to be learned about 
the model. Among possible future avenues for related research, we list the 
following: 1) The stability of measurement matrices M on the analysis union 
of subspaces UaWa; 2) The effect of noise on the cosparse analysis model and 
associated algorithms; 3) The designing / learning of analysis operators for 
classes of signals of interest; 4) More concrete and/or optimal theoretical success 
guarantees for algorithms, with a better understanding of the role of linear 
dependencies between rows of the analysis operator. 

Appendix A. Proof of Theorem 7 and Corollary 8 

Let us begin with the simplest case. For a fixed xq with cosupport A, the 
analysis ^i-minimization (22) recovers xq as the unique minimizer if and only if 

|(nAeZ,sign(nAeXo))| < ||f2Az||i, Vz € Null(M), z ^ 0. 

This follows from two facts: a) the above condition characterizes strict local 
minima of the optimization problem; b) the optimization problem is convex and 
can have at most one strict local minimum, which must be the unique global 
optimum. From this, we derive the following: The analysis ^i-minimization (22) 
recovers xq as a unique minimizer for any xq with cosupport A, if and only if 

sup |(nAcZ,sign(JlA-XA))| < ||r2Az||i, Vz e NuU(M), z ^ 

XA:riAXA=0 
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and the proof of Theorem 7 is complete. 

To obtain Corollary 8, observe that we can remove the constraint z e 
Null(M) by writing z = N-^a where is an d x (d — m) basis matrix for 
Null(M) and a E M.'^~™ is an appropriate coefficient sequence. Thus, the nec- 
essary and sufficient condition becomes 

sup |(f2AcN'^a,sign(J2AcXA))| < ||JlAN^a||i, Va e M''-'", a ^ 0. 

XA:nAXA=0 

(A.1) 

Since the £ x [d — m) matrix OaN^ is thin (£> d — m) and full-rank, defining 
P := Cl\'N'^a, we have a = (J7aN^)'I'/?. Therefore, a sufficient (but no longer 
necessary) recovery condition for analysis ^i-minimization is 

sup |(J2AcN^(nAN^)^/3,sign(nAcXA))| < ||/3||i, e ^ 0. 

XA:nAXA=0 

(A.2) 

Equivalently, for all xa with Haxa = 0, 

sup |(/3,(NJ2X)tNJ2XeSign(f2AcXA))| < 1 (A.3) 

ll/3||i = l 

that is to say 

sup ||(NnI)tNJ2jeSign(J2AcXA)||oo < 1. (A.4) 

XA:f2AXA=0 

Condition (24) follows from the above. To conclude the proof of Corollary 8, 
we note that since || sign(rZA<=XA)||oo = 1, the left hand side of (A.4) is bounded 
above by 

|||(NflJ)tNllIe|||oo^oo = |||f2AeN^(f2AN^)^|||i^i. 
Therefore, condition (25) implies (24) and the proof is complete. 

Appendix B. Proof of Lemma 9 

Since Xq is the solution of argmiux llf^xjH subject to y = Mx, applying the 
Lagrange multiplier method, we observe that xq satisfies 

n'^flxo = M'^v and Mxq = y, 

for some v G M™. From the first equation, we obtain v = (M^)tfl^f2xo. 
Putting this back in, one gets (id - M'^(M^)'f) Sl'^Clxo = 0. The last equation 
can be written as (N'^)'^Nn'^rixo = 0, where (N'^)^ is the pseudo-inverse of 
N^. Thus, 

NJi^fixo = 0. 
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Now, we split fl-^Cl = O^Ha + fl]^c^A'^ and write 

NnJnAXo = -NfiJcfiAcXo. 
Since Haxq = 0, we can also write 

noJoau = -NnJ.rjA<=xo (B.i) 

with u = xo — xq. On the other hand, from Mxq = y = Mxq, we have Mu = 0. 
This means that u can be expressed as u =: N^w for some w. Plugging this 
into (B.I), we have 

NnJriAN'^w = -NnJcHAcxo- 

Hence, w = - (NJlXflAN"^)"^ NnJcfiA-xo. This gives us 

Xo - Xo = u = -N^ (NflJfiAN^)"^ NflJcfiAcXo. 
Again, using J^aXo = 0, we have 

Haxo = -HaN^ (NnJnAN^)"^NnJcnA=xo = -(NnJ)tNnJcnA=xo- 

Appendix C. Proof of Proposition 6 

All the statements in this section arc about a 2D regular graph consisting of 
d = N X N vertices {V) and the vertical and horizontal edges (E) connecting 
these vertices. To prove the proposition, we will start with two simple lemmas. 

Lemma 14. For a fixed i, the value 

a(£) := min {\V(A)\ - J (A)} 

is achieved for a subgraph (V(A), A) — we will simply identify A with the subgraph 
from here on — satisfying |A| = ^ and J(A) = 1. 

Proof. It is not difficult to check that the minimum if achieved for A with | A| = £. 
Thus, we will assume |A| = £. 

Now, we need to show that there is also a A with J (A) = 1. Suppose that 
A with |A| = £ achieves a{£) and J(A) > 1. We will show that we can obtain 
A from A that also achieves the value a{£), and |A| = £ and J(A) = 1. For 
simplicity, we will consider the case J(A) = 2 only; one can deal with other 
cases by the repetition of the same argument. 

Let Ai and A2 be the two connected components of A. Note that on a 
2D regular graph, we can shift a subgraph horizontally or vertically unless the 
subgraph has vertices on all four boundaries of V. Since Ai and A2 arc discon- 
nected, not all of them can have vertices on all four boundaries of V. Therefore, 
one of them, say Ai, can be shifted towards the other. Let us consider the first 
moment when they touched each other. Let t be the number of vertices that 
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coincided. Then, at most t — 1 edges must have coincided. Thus, denoting the 
number of edges coincided hy s < t, the resulting subgraph A' has |1^(A)| — t 
vertices and |A| — s edges and one connected components. Now let A be a sub- 
graph obtained from A' by adding s additional edges that are connected to A'. 
Then, 

\V{A)\<\VCA')\+s<\VCA)\-t + s, 
|A| = |A| = ^, and J(A) = 1. Hence, 

|F(A)|- J(A) < \V{A)\-t + s-l = \V{A)\-J{A)-t + s + l< \V{A)\-J{A), 

which is what we wanted to show. □ 

For the next lemma, let us define the degree 5a{v) of a vertex v £ V{A): 

Sa{v) :=\{e€ A:v€ e}\ 

where v G e signifies that v is a vertex of the edge e. That is, 6\{v) is the 
number of edges in A that start/end at v. 

Lemma 15. For a non-empty A c E, 

MV{A)\> SA{v) + i 

vev{A) 

holds. 

Proof. On a 2D regular grid, ^a(^^) < 4. Therefore, we have 

4|y(A)|> ^^i^)- 
veviA) 

Since the equality above would hold if and only if 6a{v) = 4 for all v G V{A), 
the claim of the lemma can be proved by showing that there are at least two 
vertices v with Sa{v) < 2. For this, we consider two 'extreme corner points' of 
A. Let fNW be the north-west corner point of A in the sense that 1) there is no 
vertex v G V{A) that is above it, and 2) there is no vertex v (E V{A) that is left 
of v-Mw cLnd on the same level (height). Let zjse be the south-east corner point 
of A defined similarly. By definition, (5a(?^nw) < 2 and 5a(wse) < 2, and t;NW 
and vs^ are distinct vertices if A 0. □ 

Proof of Proposition 6. We will first prove the upper bound. Clearly, 

E W = 2|A|. 

vev{A) 

By Lemma 15, we also have 

4mA) I > E ^A(t;)+4 
vev{A) 



34 



Hence, we have 

|y(A)|>^ + i (c.i) 

By Lemma 14, the value of Kfjoipl^) given by Eq. (12) is attained for A with 
J(A) = 1 and |A| = Combining this with (12) and (C.I) we get 

«n.,p(^) < IV^I - (inA)l - 1) < IV^I - |A|/2 = d-^. 

The proof of the lower bound is given in Lemma 16. □ 

Before moving on to Lemma 16, we give a brief motivation for it. Our goal 
is to obtain not just a lower bound on Kn^iF but a lower bound that is close 
to optimal. By Lemma 14, Kn-oipi^) is achieved for connected A, so we will 
consider such A's only ( J(A) = 1). With J(A) = 1, the formula (12) tells us to 
look for the cases when |V^(A)| is minimal in order to compute kodifI^)- 

What is the shape of the collection of edges A yielding the minimum ? 
Recalling Euler's formula for graphs on plane: 

inA)|-|A| + |F(A)|=2, (C.2) 

where F{A) is the faces of A which includes the 'unbounded one', we see that 
we are seeking A such that |-F(A)| is maximal, i.e., there is maximum number 
of faces. By intuition, we conjecture that this happens when A consists of all 
the edges in an almost square, by which we mean V{A) is an r x r or r x (r + 1) 
rectangular grid or the inbetweens (e.g., an r x r grid of pixels to which 1 < j < r 
pixels have been added on one side). These considerations lead to the following: 

Lemma 16. 

i 

khdifW >d - -~ 

fori> 5. 

Proof. For r > 2, we consider a subgraph corresponding to an r x r square (solid 
lines) and consider graphs obtained by adding additional edges in the fashion 
depicted in Figure C.9. 
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Figure C.9: Add dashed edges (from longer to shorter dashed) to r x r square subgraph (soHd 
Unes) . 
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Wc find that for the square A, |A| = 2(r^ — r) and |V(A)| = r^, for the graph 
A with one additional edge, |A| = 2(r^ — r) + l and |T^(A)| = + for the graph 
A with two additional edges, |A| = 2(r^ - r) + 2 and \ V{A)\ =r^ + 2, and for the 
graph A with three additional edges, |A| = 2(r^ — r) + 3 and |1^(A)| = r"^ + 2. In 
fact, we observe that two edges can be added while adding one additional vertex 
until A corresponds to r x (r + 1) rectangle. Summarizing all these, a graph A 
that is constructed as above, is contained r x (r + 1) rectangle (included), and 
contains rxr square; satisfies either |A| = 2(r^— r)+2j or |A| = 2(r^ — r)+2j + l, 
and |1^(A)| = + j + 1, for j = 1, . . . , r — 1 — this holds for j = r as well. 
(Here, the case |A| = 2(r^ — r) + 1 is not stated.) By a similar observation, we 
observe that a graph A that is constructed similarly as above, is contained in 
(r + 1) X (r + 1) square (included), and contains r x (r + 1) square; satisfies 
either |A| = 2r'^ - 1 + 2j or |A| =2r'^ -1 + 2j + 1, and \V{A)\ = + r + j + 1, 
for j = 1, . . . , ) — this holds for j = r + 1 as well. 

The above observation leads to the following inequalities — which we conjec- 
ture to be in fact equalities: 

KnoiP (2(r2 -r) + 2j)>d-{r^+j), j = 1, . . . , r, 
KnDip(2(r' -r)+2j + l)>d- (r^ + j), j = 1, . . . , r, 

«S7oiF(2r' - 1 + 2j) > d - (r^ + r + j), .7 = 1, . . . , r + 1, 
«OD:p(2r2 - 1 + 2j + 1) > d - (r^ + r + j), j - 1, . . . , r + 1. 

We will now express these in a simpler form in terms of |A| = In the first 
case, letting £ = 2{r^ — r) + 2j, we have 

d-{r^ +j)=d - --r. 

Since 

2{r^ -2r+l)< 2{r^ - r + 1) < £ < 2r^ , 



we have r — 1< ^ ^<r. Hence, we can write khdif {(-) ^ d — ^ — \I ^ — 1. The 
other three cases can be treated similarly and we obtain 



k^dipW - 2 ~ V 2' 

k^difI^) - '^~2~V2"2' 

i [J 
k^dipW >d---\J- 



Therefore, for all £ > 5, we have kodif(^) — ^~2~\/i~^- '-' 



Appendix D. Discussion on the analysis exact recovery condition 

Wc observe that the analysis ERC condition (25) is not sharp in general, 
especially for the redundant 12. In the case of GAP, tracing the arguments 
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of Lemma 9 and Theorem 10, we conclude that in order for (25) to be sharp, 
there must exist a cosparse signal Xq such that f2A»xo matches the exact sign 
pattern of the row of (NnJ)'fNnJc with the largest ^i-norm and is of constant 
magnitude in absolute value. We remind that xq is the initial estimate that 
appears in the algorithm. Since the collection of CI\c±q may not span the 
whole , especially when ft is over- complete, it is unreasonable to expect the 
existence of such an xq. Similarly, in the case of analysis £i, we know that 
(25) is obtained from (24) in a crude way without taking into account the sign 
patterns of Oa^xa, which is not sharp in general for redundant ft. 

Average case performance guarantees?. Can we think of a way to obtain a more 
realistic success guarantee? We have a partial answer for this question in the 
sense that we can derive a condition — which is not a guarantee that reflects 
empirical results more faithfully. The idea is, instead of obtaining an upper 
bound of the left hand side of (24) by disregarding (or considering the worst 
case of) sign patterns, to model the effects of the sign patterns by estimating 
the size of the left hand side in terms of the maximum ^2-norm of the rows 
of (NfiJ)^ Nrijc (up to some constants). Though further investigation is de- 
sirable, we have empirically observed that the condition derived in this way 
reflected better the success rates of GAP and ^i-minimization. 

Desirable properties for r2 and M. At this point, one may ask a practical ques- 
tion: what are desirable properties of J2 and M that would help the perfor- 
mance of GAP or ^i-minimization? Can we gain some insights from our the- 
oretical result? For this, wc look for scenarios where the entries of Rq := 
J7aN^ (NJ7JJ7aN^)~^ NJlJc are small (hence, it is likely that condition (25) 

is satisfied). We start with the inner expression (NrjJflAN^) . The larger 
the minimum singular value of NfJ^JlAN-^, the smaller the entries of Rq. First, 
assuming that the rows of f2 are normalized, we note that the minimum singular 
value is larger when the size A is larger. Second, the closer the minimum sin- 
gular value is to the maximum one (this is in some sense an RIP-likc condition 
for r2), the larger it is. These two observations tell us that $7 should have high 
linear dependencies (to allow large cosupport A) and the rows of should be 
close to uniformly distributed on S''"-'^. 

Suppose that 17 has the properties described above. Then, Rq is well ap- 
proximated by Ri := 7riAN-^NriJc for some 7 > 0. Therefore, we ask when 
the entries of JIaN-^NJI^c are small. Each entry of JIaN-^NJIJc can be guar- 
anteed to be small if a) N satisfies an RIP condition for the space spanned by 
two rows of n and the rows of fi are incoherent. In summary, it is desirable 
that: 

• The rows of n are close to uniformly distributed in S*^"^. 

• n is highly redundant and have highly linearly dependent structure. 

• M is 'independent' from fi. This has to do with the RIP-like properties. 
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• The rows of ft are incoherent. 

• The cosparsity £ are large. 

Remark 17. The 2D finite difference operator Odif may he considered inco- 
herent even though the coherence is relatively large (1/4:). This is because the 
majority of pairs of rows of I^dif are in fact uncorrelated. 

Heuristic comparison of success guarantees for analysis-i^ and GAP. We point 
out that one can obtain from (26) a condition for the GAP that is similar to 
(24). For this, we observe from (26) that 

llnAXolloo = ||(NJ2X)tNf2Xef2AcXo||oo = ||[nAcN^(nAN^)t]^nAcXo||cx,- 

Since ||riAXo||oo < ||f^A<:Xo||oo is the necessary and sufficient condition for the 
(one-step) success of the GAP, we can derive a necessary and sufficient condition: 

||[nAcN^(nAN^)t]^nAcxolloo < linAcxolU 

where xq is varied over all signals with cosupport A and xq is the signal resulting 
from the first step of GAP . The above condition can be rewritten in a form 
similar to (24): 

sup ||[nAcN^(nAN^)1'^(sign(nAcXo) v)|U < 1 (D.l) 

where v := |nA<:XQ|/||nA<=xo||ooj i-e., v is obtained from f2A<=xo by taking 
element-wise absolute values and normalizing it to a unit ^oo-norm, and de- 
notes the element-wise multiplication of vectors. Condition (D.l) and (24) are 
in a similar form, but there are two differences between the two: First, for (D.l), 
the signal xq that apears is not in general a vector with cosupport A. It is rather 
a signal that arises from an approximation. Second, there is a 'weight' vector 
V in (D.l). One can heuristically deduce that such a v favours condition (D.l) 
to hold true since the size of most entries of v likely be smaller than 1. Beside 
these differences, one should keep in mind that condition (D.l) is only for one 
step. 
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